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1.0 SUMMARY 


This document describes the design and usage of two main subprograms using direct 
methods to solve large linear complex systems of the form Ax = b, whose coefficient 
matrices are too large to be stored in core. In the first main subprobram, the basic idea is 
to reduce the matrix to an upper triangular matrix and subsequently solve the problem 
by backward substitution. A row interchanging strategy called “threshold pivoting” is 
adopted to preserve numerical stability and to minimize disk-transfers. This algorithm does 
not yield the usual LU factorization of some row permutation of the coefficient matrix. 

The second main subprogram is designed for linear systems with a certain sparse structure, 
namely, the matrix can be written as B + D where B is a block-banded matrix, and D has 
only a few columns of nonzeros. A variant of the Sherman-Morrison updating formula is 
employed in this case. The second main subprogram calls the first main subprogram to solve 
l^(x,y) = (b,u) where x,y are unknowns, b is the right-hand side of the linear system 
(B + D)x = b, and U is the nonzero columns of D. 



2.0 INTRODUCTION 


This document describes the design and usage of two FORTRAN main subprograms for the 
CDC digital computer systems, namely subroutine ETCGP and subroutine ETCSM. These 
subroutines were written as part of a research effort investigating unsteady transonic flow. 
References 1 through 5 present a procedure for analyzing the flow about harmonically 
oscillating airfoils and wings in the transonic regime. This procedure is based on small per- 
turbations. A solution is formulated using finite difference techniques which results in a 
large set of simultaneous equations, written matrix form as 


Ax = b 


( 1 ) 


where A is a sparse complex matrix of order equal to the number of mesh points. The 
matrix b is a complex matrix with the number of columns equal to the number of modes for 
which pressure distributions are to be found. The matrix A may be of order 3,500 for a 
practical two-dimensional airfoil. 

Numerical solutions to equation (1) were initially obtained using relaxation techniques. 
However, for a significant range of practical values of Mach number and reduced frequency, 
experience showed that relaxation techniques applied to equation (1) failed to converge. 

The matrix A in equation (1) does not possess any of well-known properties (e.g., positive 
definiteness, diagonal dominance) which guarantee the convergence of relaxation methods. 
However, the physical origin of equation (1) guarantees the existence of a unique solution. 
Thus the obvious alternative is a direct solution method, which assumes no properties of 
the matrix A other than existence of a unique solution for an equation of the form of (1) 
when any arbitray matrix b is applied. 

Out-of-core algorithms have been discussed in standard textbooks of numerical analysis 
(e.g. Reference 6). These algorithms are numerically as stable as the well-known regular 
Gaussian elimination, yet they are not designed for efficient execution in modern day 
computers. The physical origin of equation (1) makes “blocking” of the matrix A an 
obvious and attractive data structure. Yet existing “blocked linear equation solvers” at best 
assume non-singularity and the well-conditioning of the i-th diagonal sub-block at the i-th 
blocked pivotal step (e.g. Reference 7). This property is unnatural and is not based on the 
physics of the problem. Out basic algorithm does not require the matrix A to have this 
particular property. As stated in the above paragraph, the only property of A it assumes 
is non-singularity of the matrix A. It also takes into consideration numerical stability and 
computational efficiency. Sections 43 and 44 discuss these advantages in detail. 

The FORTRAN subroutines described here are applicable to both dense and sparse matrices 
that are too large to be stored in core. Thus they are useful for solving any equation of the 
form of equation (1), in particular in the case when there is no useful properties of the 
matrix A to guarantee convergence of any iterative methods. For example, subroutine 
ETCGP ( as well as the version for real matrices) has been used in different research and 
production computer codes at the Boeing Company. 



Results of applying the routines of this document together with th program described in 
reference 5 are presented in reference 4. The development of the routines of this document 
was in conjunction with work described in references 4, 5, and 8. 

The first step in developing the algorithm for solving equation (1) is to partition the 
augmented matrix [A|B] into blocks so that it can be considered to have the following 
structure: 



Aj2- . 

• ^IS 

Bl 

^21 

^22- 

• ^2S 

B2 

* 

_^S1 


^SS 
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Figure 1. - Data Structure 

Each block is stored as a record in mass storage with the requirement that at least three 
blocks can be held in core simultaneously, and the diagonal blocks are square blocks. The 
mass storage used should be a random access file (called direct access file in the IBM nomen- 
clature). 

The methods used in both main subprograms are direct methods. In the first main sub- 
program, the basic idea is to reduce the coefficient matrix to an upper triangular system. 

In the second main subprogram, the coefficient matrix is assumed to have the for B + D, 
where B is a block-banded matrix, and D is a matrix with only a few columns of nonzeros. 

A variant of the Sherman-Morrison updating formula is employed. 

Features of the two subprograms include options to 

• Solve more than one set of right-hand sides 

• Control the frequency of pivoting 

• Access the submoduli of the main subprograms 

• Take advantage of the block sparsity of the coefficient matrix. 

In section 3.0 we shall list our symbols and nomenclatures. In section 4.0, we shall discuss 
the algorithms in detail, analyze their numerical stabilities, remark on their FORTRAM 
implementation and compare our approaches with outer approaches outlined in the liter- 
ature. In secion 5.0, we shall discuss in detail the usage of the two subprograms. 



3.0 NOMENCLATURE 


A = (ai^j) 


A is a matrix, the entry of the i-th row and j-th column is 
denoted by j 


r 


A is a matrix partitioned into blocks, the block in the i-th 
block row and the j-th block column is denoted by Aj^ j 


.T 


The transpose of A 


Upper triangular Matrix with all zeros below the diagonal 

matrix 


Block profile Matrix of the form ; 

matrix 



(where the shaded area indicates non-zero entries) 
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Block-Banded 

matrix 


Matrix of the form: 



m 

■ 

■ 

■ 

■ 


m 


9 

■ 

■ 


m 


m 

■ 

■ 

■ 

■ 


m 

m 

■ 

■ 

■ 

■ 



M 

■ 

■ 

■ 



(Where the shaded area indicates non-zero entries) 


Threshold pivoting 
and pivotal 
tolerance 


Threshold pivoting is a row interchanging strategy controlled 
by a pivotal tolerance parameter p. The ju parameter is a 
real number such that 0 < ju < 1 . If A = ^aj j j , and 


a: ; > max. 

’ j>i 


aj j , then there is no row interchanging, 


otherwise interchange rows i and m where m = max . 

j>i 


y 


Random Access file Multi-record mass storage input/output file which allows the 

user to create, access and modify its records on a random 
basis without regard for their physical positions or internal 
structures. (They are called direct access files in IBM 
nomenclature.) 


Sherman-Morrison Updating Formula for 


A’^ = B'^ 


A = B + UvT; 

- B" * U (14 + 1 u) ■ ^ V'^B" 1 


5 


Matrix norms: HA|| - max. | S jay 

i ' j 


||A||2 = max. | (s a^y ) 

Condition Number of a nonsingular matrix 

Km(A) = NL-||A''lU 

Growth (growth factor) The largest absolute value of the numbers generated by the 

of a reduction reduction process 

process 



o= max. I S ay 
■ j' ^ i 
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4.0 DISCUSSION 


In this section, we highlight the special features of our algorithms. In sections 4.1 to 4.4, 
we shall discuss Algorithm I, which reduces the coefficient matrix to an upper triangular 
matrix and solves the problem by backward substitution. Then we shall remark on 
FORTRAN implementation and compare the efficiency of this algorithm with the ap- 
proaches outlined by Bjorck and Dahquist (ref. 6), and Reid, (private communication). 

In sections 4.5 and 4.7, we shall discuss Algorithm II, which employs a variant of the 
Sherman-Morrison updating formula. Then we shall remark on its FORTRAN implementa- 
tion, and the numerical stability of our particular application of the Sherman-Morrison 
updating formula. 

4.1 ALGORITHM I (SUBROUTINE ETCGP) 

For the convenience of discussion, we shall firstly assume the coefficient matrix A to be 
dense. We shall show how the algorithm can be modified for a block-profile system. 

We shall first illustrate the algorithm with a 3 x 3 block system, and assume no row inter- 
changing is necessary. In this case, the algorithm yields Grout’s LU decomposition. The LU 
factors can replace the original coefficient matrix A on disk. Assume, graphically, A is 
stored on disk as in the following figure. 



Figure 2.— Original Matrix 

In the first block pivotal step, we form the LU factorization ofAjj.Ajj^ = LjU^ and 
replace Aj ][,Aj 2 ,Ai 3 >A 2 i,A 32 as in figure 3. 
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LiU, 

Li->Ai2 


AjiUfl 

^22 

^23 

A3,Uf> 

■ ^32 

^33 


Figure 3.— After First Block Pivotal Step 


For i = 2, 3, write Lj Ajj as U^j, and Uj ^A|jasLjj. 

In the second block pivotal step, replace A 22 with A 22 ■ ^21^12’ decompose the 
resultant into its LU factors, and complete the rest of the second block pivotal step as 
illustrated in fig. 4. 


LiUj. 

U12 

Ui 3 

L 2 I 

^22 

^23 

L 3 I 

^32 

^33 


LiUi 

U 12 

Ui3 

^21 

(l-2l’2) 

L2’^(^23" ^ 21 ^ 13 ) 


^22"L2iUi2 


L 3 I 

U2'^(^32‘ ^1^^^) 

^33 


Figure 4.— Second Block Pivotal Step 


Write L 2 (A 23 -L 2 iUi 3 ) as U 23 , and U2(A32-L3iUj2> ^32‘ pivotal step, 

we replace A 33 with A33 - L 3 iU ^3 -L 32 U 23 and decomplse the resultant to its LU 
factors. The third block pivotal step can be illustrated by fig. 5. 





LiUi 

U 12 


LiUj 

U 12 

Ui3 





L 2 I 

1 - 2^2 

U 23 

— 

L 2 I 

L 2 U 2 


L 3 I 

L 32 

"^33 








L 3 I 

L 32 







^33 




Figure 5.- 

-Third Block Pivotal Step 



U 


13 


U, 


23 


(L3U3) 




% 
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Thus we have reduced the matrix A to an upper triangular matrix, and thus equation (1) can 
be solved by backward substitution. 

Now we shall explain our row interchanging strategy, which is commonly known as 
threshold pivoting. A real number ju is chosen so the 0 < /x < 1 . We perform row inter- 
changing only when |a|j| < p • Max. |ajj| . 

j>i 

Note that if ^ = 0, there is no row interchanging at all; if /x = 1 , we have the regular row 
interchanging. We shall illustrate the application of this row interchanging strategy to a 
2x2 block system : 




uii uj2 

Ul' 


0 ^22 

^21_ 


^31 ^32 



_f41 ^42_ 


Figure 6. —First Pivotal Step of a 2x2 Block System 

In the first pivotal step, we first decompose Aj j into its LU factors with pivoting, i.e., 
Aj j = LjUjPj, then replace Aj 2 with Li~^P]Aj 2 - 

^12 
"22 

uii|>/x-max. {|a3il , |a4i| | = hij 


Let Uj 2 “ Lj”*Pj Aj 2 - Now suppose 


U = 


^11 




We have to interchange row 1 of Uj with row 1 of A 21 , then eliminate uj j and a 4 j with 
a 3 j as the pivot, the resultant is as follows: 

^31 ^32 

^ ^22 

0 uj2 

0 34^ 
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Nowmppose . M) 

We have to ipterchange tow 2 with row 2 o£ Ajl, and eliminate u ,2 and ujj “42 ““ 


pivot, the resultant is: 


^31 

^32 

0 

H2 

0 

0 

0 

0 _ 


After the first block pivotal step, the disk storage should be as follows: 


LiUi 

LiPiAn 

Multipliers and 

^22 

pivoting information 

— 


(In the formal description of our algorithm, we shall write as ^\ 2 ’ *he disk 

record that stores the multipliers and pivoting information as indicated in the previous 

diagram L 2 p) 


The following is a formal description of Algorithm I: 


Algorithm I 

Let A be an n X n block system 
R)r i = 1 step 1 until n do 

For j = i step 1 until n do 
Read A^j into core 

For k = 1 step 1 until i-1 do 
Read Ljj^, Uj^j into core 


Modify 


Ukj 

lAjj 


by repeating the row operations done to 


uti 

L^kil 


If there is row interchanging between 


Uk 

^ki 


, rewrite Ujg on disk. 
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Enddo 


Ifj=i 

Aji = LjjUjP^ 

Else 

Endif 

RewriteAy (rename A^j as Uy) 

Enddo 

For j = 1 step 1 until n do 
Read Ay into core 
For k = 1 step 1 until i - 1 do 
Read Ljj^, into core 


Modify 


Uki 


Ji 


by repeating the row operations done to 


If there is row interchanging between 


Ut 


A: 


jk 


Ut 


"jk 


, rewrite Uj^j on disk. 


Enddo 

Eliminate Ajj from 


Ui 

Aji 


with row interchanging if necessary. 


Store multipliers and pivoting information as Ljj on disk. 


Enddo 

Enddo 


Repeat the same row operation to the right-hand side, and solve for x by backward sub- 
stitution. 

Note: 

1 . If there is no interblock pivoting, it requires two disk read/write’s in the innermost 
do-loop; otherwise, it requires three. 

2. Nowhere in the algorithm do we assume non-zeros in the diagonal blocks. Although 
nonzeros in the final upper triangular form will indicate matrix singularity, and back- 
ward substitution will be impossible to carry out. 


11 



Now we shall show how the algorithm can be modified for a block-profile system. 
Define an n x n matrix K such that; 

K(i, j)= 0 indicates the (i, j)block is a zero block 

K(i, j)/= 0 indicates K(i, j)is the location of the (i, j)block (a nonzero block) on disk. 
For example, for the following block-profile matrix, 



Where the shaded area indicates the nonzero blocks, its corresponding K matrix can be 

1 ., 2 - 

3 

4 5 

6 7 8 9 

10 11 

We call the K matrix the profile matrix of the matrix A. 


In subroutine ETCGP, we scan the profile matrix to determine the beginning and end of 
each block row and block column, and change the “ends” the do-loops in the algorithm 
accordingly. To handle the ‘'fill-in ’s”ot the zero blocks (i.e., if the (i, j)block is originally 
a zero block, but the reduction process turns it into a nonzero block), we find the maximum 
entry of the original K matrix, m, and set K(i, j)= m + 1, update m by adding 1 to it, and 
revise the information on the beginnings and ends of the i-th block row and j-th block 
column if necessary. 


4.2 REMARKS ON FORTRAN IMPLEMENTATION OF ALGORITHM I 


1 . Storage of the multipliers and pivoting information 

Suppose the i, j block is a nonzero block that is njxnj, and i>j. The number of words 
in its corresponding record on disk is 2*(nj*nj)+nj+l ' Before the j-th block pivotal step, the 
first 2*(n^*nj) words stored the entries of the (i,j) block, and the last nj+l words are zeros. 
After the j-th block pivotal step, the first 2*(nj*nj) words store the multipliers used, and 
the last nj+l words store the pivoting information. We can consider this record as consist- 
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ing of two separate arrays; an array of complex numbers, A(nj,nj), and an array of integers, 
I(nj+1). In the 2x2 block system we used to illustrate the threshold pivoting discussion, 
after the first pivotal step, the (2,1) block of fig 6., that is A21, is stored as an array of 
complex numbers, A(2,2), as 

Uii/asi 

341/331 

U 12/342 
U22/342 

and the pivotal inforrhation is stored in the integer array, 1(3), as 

1 

2 

3 

The two arrays are packed as one record on disk. 

The matrix A stores the actual multipliers used in the reduction process, 1(1 )=1 indicates 
the first row of A21 interchanged with row 1 of the upper triangular matrix Uj before the 
first column of A21 is eliminated. I(2)=2 indicates the second row of A21 interchanged with 
row 2 of Uj before the second column of A21 is eliminated. I(3)=I(1)+I(2)X) indicates 
there is interblock pivoting between the first block row (the pivotal block row) and the 
second block row. If there is no interblock pivoting in this step, 1(1 )=0, and I(2)=0, thus 
I(3)=0. As it is obvious from this example, l(k)=0 indicates that no row interchange is 
needed to eliminate the k-th column of the (i^j) block. I(k)=m>0 indicates that the m-th 
row of the (ij) block is interchanged with the k-th row of the corresponding upper triang- 
ular matrix Uj before the k-th column of the (i, j)block is eliminated, and 
I(nj+l)=I(l)+I(2)+. . .+I(nj). If I(nj+I)=0, it indicates that there is no interblock pivoting 
between the i-th and j-th rows; if I(nj+1 )>0,it indicates that there is interblock pivoting 
between the i-th and j-th block row. 

2. Optimization of the Innermost Loop 

As indicated by the 3x3 block system in section 4.1 , if there is no interblock pivoting, the 
most expensive operation is the matrix operation 
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Ay Ay - Ljk • Uj^j 

In subroutine ETCGP, this operation is carried out by a Compass subroutine CCSMAB 
which is about five times faster than straight FORTRAN. Even if there is interblock row 
pivoting, we still can take advantage of CCSMAB. Consider the situation in the innermost 
do-loop in the description of Algorithm I. We have the following blocks in core: 


0 Ukj 

^ik •'*^ij 


If there is no pivoting between the i-th and j-th block row, the operation can be repre- 
sented by the following equation: 


where L is such that 


Uk; 


Uk7 



_u_ 

[T 

51 



Mk ^ 


If there is pivoting between the i-th and j-th block rows, then the operation can be repre- 
sented by the following equation: 

I 

tP' 

where P is that permutation matrix that interchanges the rows and L is such that 


Ukj 

II 

A 

Ukj 

^j 




A ^ 

L =H-P 



0 

I 
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We can write 


Therefore 


A-1 

L = 


M 


A 

L = 


-1 


-ML 


-1 




Also write 


Therefore 


Ukj 



A;; 


A;; 

L 


_ IJ 

r“ • 

1 



A 



V'Ukj ‘ 

L 


= 







Thus in ETCGP, we. compute tij^j = then use CCSMAB to compute Ay-MUj^j 


Note that M is just the lower block of P 


L; 


ik 


and can be computed quite efficiently by 


forward substitution. (The following describe the compution M and U = . 

M would overwrite Ljj^, and U would over write Uj^j. We shall call Ljj^, M and Uj^.j, U and 
the actual row and column dimension of U, NR, and NC.) 


For m=2 step 1 until NR do 

If l(m)>0 
k=I(m) 

For h=l step 1 until NC do 

m- 1 


U(m, h) = U(m, h) - 
Enddo 


M(k, s)*U(s, j) 

s = 1 
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For h=l step 1 until m-1 do 
M(k,h)=0 
Enddo 
Else 

(no operation) 

Endif 

Enddo 

3. Pivotal Tolerance 

At this point, we cannot give any theoretical guidance on the choice of the pivotal tolerance 
p. Experimaentally, p = 0.001 proves to be satisfactory. In ETCGP, we store p in a labelled 

common block 

COMMON/ETPIVOT/U 

and U is set to 0.001 by at DATA statement. The user can reset the value of U by any ex- 
ecutable statement before calling ETCGP. 

4. Monitoring of Stability 

ETCGP keeps track of the ’‘growth ” of the reduction process (which is the largest absolute 
value of all the numbers generated by the reduction process). Our motivation for keeping 
track of the growth is explained in section 4.3. 

ETCGP also generates an extra right-hand side row which is the row sum of the coefficient 
matrix A. It subtracts (1.0,0) from each element , in the solution of this extra right-hand side. 
The resultant gives some indication of the “smallness’ of the residue Ax-b of the actual 
problem. 

4.3 NUMERICAL STABILITY OF ALGORITHM I 

Reid (ref. 9) and Wilkinson (ref. 10) have analyzed the stability of the regular Gaussian 
elimination. If L and U are respectively computed lower and upper triangular factors of A, 
the A, L and U are related asfollows: 

A=LU + E . 

with E=(ej> j)and |ej, jK 3.01 * m* e*g 

where m is the order of A, and e is the machine precision, (in CDC equipment, e = 10" ), 

g is the growth of the reduction process. 
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Reid and Wilkinson analysis cannot be extended in any obvious manner to give a practical 
bound for E for Algorithm I of this document. The best we can do at present is the 


following: 


. . . M “^U + M, 


. . . .M. 


+ ]vriEi 


where Mj is the matrix that performs the ith block pivotal step 

^i " (®k/) l®k/| * e * gj with gj being the growth of 

the first ith block pivotal step. 


The main difficulty is due to the fact that we do not use the same pivots for eliminating 
the subdiagonal elements in the same column. Thus 

MflM 2 ~^.. .Mj'^Ej ^Ej for i=l,2, ...,M 

whilst equality holds for the above expression in the regular Gaussian elimination. However 
each block pivotal step the process of eliminating the lower triangular elements of A|, j from 



for i < j < n are regular Gaussian elimination. Thus the "growth” of Algorithm I at least 
gives the local stability of these substeps. The norm of the difference of the computed solu- 
tion from the actual solution give a realistic bound for the norm of E, because the actual 
solution X satisfies the equation Ax = b, the computed solution y satisfies the equation 
(A + E)y=b. 

Thus 

Ax = (A + E)y 
A(x - y) = Ey 


l|Ey||<||A 2 ll llx-y|| 

In all our test problems 

||x-y||= 10-11 
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4.4 COMPARISON OF ALGORITHM I WITH OTHER APPROACHES 


Bjorck and Dahlquist (ref. 6) and Reid (ref. 8) have proposed approaches to the problem of 
solutions of large dense linear systems. Bjorck and Dahlquist’s approach follows. 

The matrix is partitioned into block rows: 



with the requirement that a minimum of two block rows can be held in core simultaneously. 
The first block pivotal step can be described as follows: 

1 . Read A| into core and reduce it to N^ ^A] with regular Gaussian elimination. 


2. For j=2,3. . .,n read Aj into core reduce 



with row-interchanges when necessary, and write Aj back onto disk. 


3. Write Aj back onto disk. 

The other pivotal steps are similar. 


Reid proposed the same data structure as we do. His first block pivotal substeps consists of 
the following substeps to be performed for i=2,3,. . .,n. 


1 . 


Read into core the blocks 



and apply normal Gaussian elimination, overwriting 
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eliminated elements by multipliers and using row interchanges. Note that for i > 3 
the modified block Aj ^ is upper triangular and advantage may be taken of this. 


■lA 


2. For j=2,3. . .n read in blocks ( *■’ ) modify them using the multipliers and interchanges 

^i/ 


held in 


41 


Ai 


and then write them back to disk. 


il 


3. Write modified block Ajj to disk. 

The remaining block pivotal steps are performed similarly. Operations on the right-hand 
side vector may be performed subsequently using the stored multipliers or at the same 
time as the elimination. 


Reid has proved that given the same amount of core, the block row storage scheme is only 
efficient for matrices of order of less than 300. For large matrices, Algorithm I requires the 
least amount of disk input/output. The following table summarizes the features of the three 
approaches. 


Approaches 

Features 

No. of 
disk access 

Dahlquist 
and Bjorck 

two block rows are needed in core 

m^ 

V 

Reid 

four blocks are needed in core 
four disk read/write’s in the 
inner most do-loop 

32/ m^ 

3 \v/n/ 

Algorithm I 
(without pivot- 
ing) 

three blocks are needed in core 
two disk read/write’s in the 
inner most do-loop 


Algorithm I 

(with the worst 
possible case of 
pivoting) 

three disk read/ write in the inner 
most do-loop 



* m is the order of the matrix, and N is one half of the number of words 
available in core. 


19 



















4.5 ALGORITHM II (SUBROUTINE ETCSM) 


Algorithm II is designed for coefficient matrix with a special sparsity structure : 


A = 


X X 
XXX 
XXX 
XXX 
X X X X 
X XXX 
X X X X X 

X X X X 

XX X 
X X 



We can write A = B + D where B is a banded matrix; 


and D = A -B : 


B = 


X X 
XXX 
XXX 
XXX 
XXX 
XXX 
XXX 
XXX 
XXX 
X X 



*The x’s are single elements 
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nr TT 

D can be written as D = UV^ , where U and V are as follows: 


U = 


0 o 
o o 
0 o 
o o 
X o 
X X 


X X 


T foolooooooo 
= 1 

^ oooolooooo 


Thus to solve a x = b, we apply the Sherman-Morrison updating formula 

x = A'lb = (b + UV'^) 


The algorithm II can be described as follows: 

1 . Reduce B to an upper triangular matrix with Algorithm I. 

2. Repeat the row operation in 1 to [b[U] . Solve for B [z,y] = [b,U] by backward 
substitution. 

3. Compute [S,T] = [z,y] . 

4. Decompose (I+T) into its LU factors with row interchanging. 

5. Solve for w in the equation (I-T) w=S by forward and substitution. 

6. Compute x=S-T*x. 
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4.6 REMARKS ON FORTRAN IMPLEMENTATION OF ALGORITHM II 


In this section, we shall use the notation we have defined so far. 


1 . Storage of U and V 

We partitioned U comformable to the partitioning of the coefficient matrix A: 



U = 



and store them on disk in the same block which stores the corresponding partition of the 
right-hand side. 



u7 

^2 

U2 


Un 


Since the matrix V only consists of ones and zeros, we do not store V explicitly. 


We use two-dimensional array: 

INTEGER N(MR,MC) 

where MR > the number of block columns that contain element columns in the matrix 
D=UV^, and MC > 2 + the total number of columns in the matrix U. The contents of N are 
defined as follows: 

Let i]^ ,i2>- • • • the indices of the blocks that contains columns of D, and then 
N(l,l)=ii 
N(2,l)=i2 
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Suppose for j=l ,2. . . all (ij,m) blocks for some m > kj contain the nonzero elements of 
D, then 

N(l,2)=ki 

N(2,2)=k2 


Suppose for j=l ,2 . . . , D contains the following columns of the ij-th block: 
mj_i .raj 2. .... raj_„y), with mj j <nij_2<. . .<mj „q). 

Then 

N(l,3)=mi i,N(l,4)=mi 2 , • . . , N(l,n(l)+2)=mi N(l,n(l)+3)=0 

N(2,3)=m2^1, N(2,4)=^2,2’ • • • > N(2,n(2)+2)=j„2,n(2)’ N(2,n(l)+3)=0 


It is important that N(j,n(j)+3) be set to zero, so that the program know the mj ^(j) is the 
last column from the ij-th block column to go into D. The following 3x3 block system 
will illustrate our scheme: 
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X 

X 



X 

X 



X 

X 



X 

X 



X 

X 

X 


X 

X 

X 


X 

X 

X 


X 

X 

X 


X 

X 

X 


X 

X 

X 



N(1 ,1)=1 meaning the first block column contains columns for D 
N(2,l)=2 meaning the second block column contains columns for D 
N(1 ,2)=2 meaning in the first block column, only (m,l),m > 2 contains columns for D 
N(2,2)=3 meaning in the second block column, only (m,2),m >3, contains columns for D 
N(1 ,2)=3,NU ,4)=5,N(1 ,5)=0 meaning only the third and fifth columns of the first block 
column belongs to D 

N(2,3)=3,N(2,4)=0 meaning only the third column of the second block column belongs 
to D 

2. Computation of V^(z,y) (Step 3 of Algorithm II) 

Since V is not stored explicityly, we use the following: 

(Let NB be the number of block columns that contain column of D, L be the total number 
of nonzero columns of D, and the two-dimensional array N, is as defined as before, and 
NC=L+number of right-hand sides.) 
g=l 

For i=l step 1 until NB do 
k=N(i,l) 

Read into core 

For j=3 step 1 until LU+2 do 
m=N(i,j) 

If m=0, then exit do-loop with index j 
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Otherwise 

S(s,h)=Tj^(m,h) 

Enddo 

g=g+l 

Enddo 

T 

Note that the array S contains v^(z,y) 

4.7 NUMERICAL STABILITY OF ALGORITHM II 


Suppose A and B are matrices of order n and are related in the following manner: 


A = B + UV'^ 


( 2 ) 


where U and V are nxr matrices with r and both U and V are of rank r. Then it follows 
from a variant of the well known Sherman-Morrison formula (ref. 6) that 


A-1 = B’‘ + 


(3) 


where Ij. is the identity matrix of order r. Assume B is well-conditioned and has been 
decomposed into LU factors B=LU, then equation (3) provides a very efficient method to 
solve the linear system 


Ax = b. 


(4) 


However, a general concern when using equation (3) is that the matrix ^Ij. +V^B 
may be ill-conditioned. In this section, we prove a sufficient condition for the well condi- 
tioning of ^Ij. + V^B”^cj 

Before we proceed, we state the usually accepted definition of the condition number of a 
matrix and an inequality related to it. 


Let A be an nxm, n>m, matrix with linearly independent columns, then the conditional 

max. ||Ax|| 

«A)= 


number of A, denoted by k(A) is such that 


mm. 
Ilxll = 1 


Axil 
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It is also true that 


( 5 ) 


where A"* 


(aTa)-'aT 


kCAj^lAilj-llAi^ 

the generalized inverse of A. 


Let R(A) be the range of A. Inequality (5) is obvious from the fact that; 


min _ min ||Ax|| _ max ||x|| _ Iroll 

||x ||2 ^ 1 ^ l|x||?^o ||x|| IMIt^o llAxll llAxoll 


llAXol 


< 


max II . 

M=1 


Further, if A is nonsingular, equality will hold. 

We now prove the following: 

Theorem 1 . If A and B are related as in equation (2), and A and B are invertible, then 

(l-V^B'^u) =U'*'AB"lu (6) 

(l - v'^B'^u) = vTb"U + (7) 

where U'*' = (u%) , and = V (v^v) ^ are the 

generalized inverses of U and V, respectively. 

Proof: Note 

A = B-UV”^ 

= (i„-UvTb'1) B 

AB-1 =(ln-UvTB-l) 

.-.AB’lu =(u-UvTb'^u) 

.-. AB'^U = (Uj.-vTb'1u) 

.-. (ij-'V^B'^u) = U'^AB'^U 

Thus we have proved the validity of equation (6). The proof of equation (7) is similar 
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( 8 ) 


The following equations are direct derivations from equations (6) and (7): 

(l- = U + BA"^U 

(l - v'^B’^u) = V^A'^B 

Let «i = (l|u|l2•llu1l2)^:1^2 =(| v'^|| 2*11 )> = llu|l2-llui2-lvi-|(vT)+||2 

Combining equations ( (6), (7), (8), and (9) with the factjthat IICDH2 ^ IICH2IIDII2 
for any two matrices C and D, 

Ic(i-vTb'Iu) = ||l-vV^U 2* (l“'^^®'*u)~^||2 
k(l-vVlu) < min. {fii,L2,C3} k(A) k(B). (10) 

Inequality (10) states that if A and B are well-conditioned, and either U or V is well-conditioned 
then I+V^B~^U is well-conditioned. 

Inequality (10) is useful because IIU 2 I > j|u"^|l2 > I (v^)^|| 

computed quite economically and the physical problems that yield matrices A and B usually 
give some indication of their well-conditioning. The well-conditioning of B and |^I - V^B'^uj 
ensures that the solutions of bx^ = b 

By = U 

(1 + vTb”1u)z = 

can be computed with satisfactory accuracy. Thus the solution of Ax=b via equation (3) 
can be computed with satisfactory accuracy (11). 

T 

In our particular application of the Sherman-Morrison updating formula, V is a permu- 
tation matrix, thus I|v’^ll2' llv|l2 = 1 • Therefore as long as A and B are both well-condi- 
tioned, Algorithm II is always stable. 
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5.0 COMPUTER PROGRAM USAGE 


5.1 MACHINE AND EXECUTION ENVIRONMENT 

The algorithms described in the previous section are programmed as a FORTRAN sub- 
routine library (which we call OCSLIB out-of-core solution library). The subroutines are 
written for the FTN compiler of the CDC computers. 

5.2 OPERATING SYSTEM 

This subroutine library is designed for NOS 1 .1 . Its compass subroutines are optimized for 
the CDC 6600. 


5.3 TIMING 

Timing is hardware and operating system dependent. The following formula gives a very 
rough estimate for the timing; 

CP second = l/2*n*p*k 

where n is the order of the linear system, and p is the band-width and k is machine- 
dependent. To estimate k, make a sample run and compute k using the above formula. 

For the Cyber 175, k s 8 x lO"^ 

5.4 FILES AND FILE FORMAT 

OCSLIB uses at least one random access file. OCSLIB has each block of the coefficient 
matrix as a record. If the block is nj x nj, then the record length is 2*nj*nj + nj + 1 . 

5.5 USAGE 

5.5.1 USING ALGORITHM I (SUBROUTINE ETCGP) 

We recommend the following precedure: 

Step 1 . Define the block system of the coefficient matrix; choose a sequence of positive 
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integers n j , n 2 , . . • to partition the matrix into a block system like fig. 1 of section 
1 .0, so that the block Aj, jis nj x nj. The partition is to be chosen so that at least three 
blocks can be held in core simultaneously. 

Step 2. Define the block profile of the matrix: define an array 
INTERGER INDEX (M,M+1 ) 

such that M> N, and INDEXfi, j)=0 if the j is a zero block. 

INDEX(i, j)^0 if the Aj, jhas at least one nonzero entries. 

(here we consider bj^=Aj , for k=l ,2, . . . , N) 

Step 3. Write the augumented matrix on disk for ETCGP,nj^ (k=l ,2, . . . N) at a time. 

In this step we provide a subroutine ETBMGEN to write a block row of the augmented 
matrix on disk in a format that is acceptable by ETCGP. Thus we shall describe the usage 
and argument list of ETBMGEN. 

COMPLEX W(MXR,MXC) 

INTEGER INDEX(MT,MT+1),JN(NT2),NDBLK(NT1) (where NT1>NTBLKS+1 

NT2>NTBLKS**2 + NTBLKS+3) 


DO 10I=1,NTBLKS 

(Code to generate the i-th block row of the augmented matrix.) 
CALL ETBMGEN(I,NTAPE,IN,MT,NTBLKS,WORK,MXR,MXC) 
10 CONTINUE 
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The argument list for ETBMGEN is described as follows : 


SUBROUTINE ETBMGEN ( I f NTAP E, 1 NDEX ,MT, JNtNTBLK St NDBLK .WORK , 
$MXR,MXC) 


j0c4c*3t(})n»(Tic4e 


c 

C PURPOSE TO WRITE THE NON-ZERO BLOCKS OF A BLOCK ROW OF MATRIX 

C IN ETCGP FORMAT 

C 

C ARGUMENT LIST 

C 

C I - BLOCK ROW INDEX 

C NTAPE - OUTPUT DISK FILE 

C OUTPUT - ARGMENTED MATRIX IN ETCGP FORMAT 

C INDEX - 2 DIMENSIONAL ARRAY FOR THE PROFILE OF THE MATRIX 

C ROW DIMENSION = MT, COLUMN DIMENSION = MT+1 

C‘ INDEX(I,J)=0 (I.J)-TH BLOCK IS ZERO 

C INDEX(I,J)-GT.O LOCATION OF (I,J)-TH BLOCK ON NTAPE 

C MT - ROW DIMENSION OF THE ARRAY INDEX 

C JN - INTEGER ARRAY FOR THE RANDOM ACCESS FILE NTAPE. AT LEAST 

C (NTBLKS+NTBLKS*NTBLKS+3) MANY WORDS LONG 

C NTBLKS - NO. OF BLOCK ROWS 

C NDBLK - ARRAY TO STORE BLOCK SI ZES.NDBLK (NTBLKS+l)=NO.OF RHS 

C WORK - (2 DIMENSIONAL TO USER) 

C INPUT ARRAY FOR THE NON-ZERO BLOCKS OF THE I-TH BLOCK ROW 

C MXR - ROW DIMENSION OF WORK 

C MXC - COLUMN DIMENSION OF WORK 

C 

C NOTE SUBROUTINE ETCGP ASSUMES ALL THE NON-ZEROBLOCKS TO BE 

C DENSE BLOCKS. (SEE THE SAMPLE CALLING PROGRAM) 

C 


ETBMGEN alters the contents of the array INDEX. 


Step 4. To solve the linear system via ETCGP 
COMPLEX WORK (LW) 

INTEGER INDEX(MT,MT ,+ 1 ) ,NDBLK(NT 1 ) 


CALL ETCGP(NTAPE, INDEX, NTBLKS, NDBLK, WORK, LW,ITAG) 

The calling sequence of ETCGP is as follows: 

SUBROUTINE ETCGP(NTAPE,INDEX,NDBLKS, NDBLK, WORK,LWORK,ITAG) 
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SUBROUTINE ETCGP (NT APE ,I NDEX tNTBLKS t NDBLK , WORK, LWORK, ITAG) 
OPTION, KEEP =OFF, INLI ST=ON 


C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


PURPOSE MAIN SUBROUTINE 

REDUCE MATRIX TO AN UPPER TRIAGULAR MATRIX - CALL ETCGPRM 
REPEAT ROW OPERATION ON RHS - CALL ETCGPFS 
BACKWARD SUBSTITUTION - CALL ETCGPBS 

ARGUMENT LIST 

NTAPE - INPUT/OUTPUT DISK FILES FOR MATRIX 
INPUT - ORIGINAL ARGMENTED MATRIX 
OUTPUT - ROW OPERATIONS PERFORMED AND UPPER 

triangular form and solution 

INDEX - 2 DIMENSIONAL ARRAY FOR THE PROFILE OF THE MATRIX 

ROW DIMENSION = NTBLKS, COLUMN DIMENSION = NTBLKS+1 
INDEX(I,J)=0 (I,J)-TH BLOCK IS ZERO 
INDEX( I, J) .GT.O LOCATION OF (I,J)-TH BLOCK ON NTAPE 
NTBLKS - NO. OF BLOCK ROWS 

NDBLK - ARRAY TO STORE BLOCK SI ZES , NDB LK t NTBLKS+1 ) =NO. OF RHS 
WORK - WORKING ARRAY (COMPLEX TO USER) 

LWORK - LENGTH OF THE ARRAY WORK REGARDED AS COMPLEX 

LWORK. GE. (NTBLKS**2 + 3*NTBLKS + 3*MXB + 3*N) 

WHERE MXB = (MXBLK**2) + (MXBLK+2)/2 WITH MXBLK = THE 
MAXIMUM OF ALL THE ENTRIES OF THE ARRAY NDBLK, AND WHERE 
N IS THE ORDER OF THE SYSTEM 
ITAG - COMPUTATIONAL PATH 

ITAG=1 REDUCE MATRIX TO UPPER TRIANGULAR FORM AND 
SOLVE AX=B 

ITAG=2 REDUCE MATRIX ONLY 

ITAG=3 SOLVE AX=B ASSUMING A HAS BEEN REDUCED 


C**********’<f************’lf***«***‘***^*>i‘**’^>>i********#*i******Jlf^<lr*** 


(The array WORK should be equivalenced to the arrays W and JN as follows: 
EQUIVALENCE (WORK,W) (WW(1,MXC+1),JN) 


Step 5 . Read the solution from NTAPE. In this subroutine we provide a subroutine 
ETRDSOL which has exactly the same calling sequence as ETBMGEN. Its usage is described 
below: 

DO 10 1=1, NTBLKS 

CALL ETRDSOL(NTAPE, INDEX, MT,NTBLK,W,MXR,MXC) 

(code to process the part of the solution corresponding to the i-th block row) 

10 CONTINUE 
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5.5.2 USING ALGORITHM II (SUBROUTINE ETCSM) 


We recommend the following procedure: 


Step 1 . Same as step 1 of section 5.5.1 . 


Step 2. Same as step 2 of section 5.5.1 , except call subroutine ETSMGEN 

instead of ETBMGEN. The argument list of ETSMGEN is as follows: 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


SUBROUTINE ETSMGEN ( I «NTA PEt INDEX tMT , JN , NTBLKS , NDBLK, WORK, 

$ MXR,MXC,NB,MLB,LB,LU) 

:4c ^ # 4e ^ 4 4c ^ ^ 

PURPOSE WRITE THE MATRIX IN A FORM ACCEPTABLE BY ETCSM 

ARGUMENT LIST 

I, NTAPE, INDEX, MT,JN,NTBLKS,NDBLK, WORK, MXR.MXC - SEE SUBROUTINE 
ETBMGEN 

NB,MLB,LB,LU GIVE THE SPARSITY STRUCTURE OF THE MATRIX D 
NB - DIMENSIONED AS 

INTEGER NB(MLB,LU+2) 

NB(I,1)=K, THE K-TH BLOCK HOW CONTAINS THE COLUMNS OF D 
NB(I,2)=J, ONLY THE BLOCKS (M,K),M.6E.J CONTAINS THE CDLUWiS 
OF D 

FOR T.GE.3, NB(I,T)=S, THE S-COLUMNS OF THE K-TH BLOCK COLUMN 
IS IN 0. IF 0 ONLY CONTAINS H COLUMNS OF THE K-TH BLOCK COLUMN 
THEN SET NB(I,H+2»=0 
MLB - ROW DIMENSION OF NB 

LB - TOTAL NO. OF BLOCK COLUMNS THAT CONTAINS COLUMNS OF 0 
LU - TOTAL NO. OF NON-ZERO COLUMNS OF D 

♦****j(e+#*****ii*i|i*!4c>)e!j<* 


Step 3. Solve the linear system via ETCSM. The argument list of ITCSM is as follows: 


SUBROUTINE ETCSMINTAPE , INDEX, NTBLKS, NDBLK , WORK , LWORK, ITAG, 

$KNB,MLB,LB,LU) 

C 

C PURPOSE APPLY THE SHERMAN-MORR ISON UPDATING FORMULA TO SOLVE 

C (BE+D)X=B, WHERE BE IS A BLOCKED BENDED MATRIX, 

C AND D ONLY CONSISTS OF A FEW COLUMNS OF NON-ZEROES. 

C THE NON-ZERO COLUMNS OF D IS ASSUMED TO BE STORED WITH THE RHS. 

C 

C ARGUMENT LIST 

C NTAPE, INDEX, NTBLKS, NDBLK, WORK, LWORK - SEE SUBROUTINE ETCGP 

C ITAG - THE VALUE OF ITAG IS PASSED ON TO ETCGP 

C KNB,MLB,LB,LU GIVE THE SPARSITY STRUCTURE OF THE MATRIX D 

C KNB - DIMENSIONED AS 

C INTEGER KNB<MLB,LU+2) 

C KNB(I,11=K, THE K-TH BLOCK ROW CONTAINS THE COLUMNS OF D 

C KNB(I,2)=J, ONLY THE BLOCKS (M,K),M.GE.J CONTAINS THE COLUMNS 

C OF D 

C FOR T.GE.3, KNB(I,T)=S, THE S-COLUMNS OF THE K-TH BLOCK COLUMN 

C IS IN 0. IF D ONLY CONTAINS H COLUMNS OF THE K-TH BLOCK COLUMN 

C THEN SET KNB{I,H+2>=0 

C MLB - ROW DIMENSION OF KNB 

C LB - TOTAL NO. OF BLOCK COLUMNS THAT CONTAINS COLUMNS OF D 

C LU - TOTAL NO. OF NON-ZERO COLUMNS OF D. 

C 

C^^titt***#***#*****^****************#*******^*********^:*******#***#**** 
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5.5.3 USING THE ITERATIVE REFINEMENT SUBROUTINE (ETCIT) 


ETCIT requires the original matrix and the output matrix from TECGP to be in different 
files. The usage of ETCIT is obvious from the explanation of its argument list; 


SUBROUTINE ETC ITCNTAPE I , INDEXI > NTAPEO t INDEXO ,NDBLKt NTBLKSt 
$WORK,LWORK) 

C’tf**********************************!!!*^******:***************** 


C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


PURPOSE ITERATIVE REFINEMENT 
ARGUMENT LIST 

NTAPEI - INPUT DISK FILE STORES ORIGINAL MATRIX 
INDEXI - 2 DIMENSIONAL ARRAY FOR THE PROFILE OF THE MATRIX IN 
NTAPEI (SEE EXPLANATION FOR THE ARGUMENT INDEX IN 
SUBROUTINE ETCGP) 

NTAPEO - OUTPUT DISK FILE FROM ETCGP, STORES THE UPPER TRIANGULAR FORM 
AND THE ROW OPERATIONS PERFORMED BY ETCGP, ALSO THE 
SOLUTION FROM ETCGP. 

THE FINAL SOLUTION FROM ETCIT WILL OVERWRITES THE SOLUTION 
FROM ETCGP. 

INDEXO - 2 DIMENSIONAL ARRAY FOR THE PROFILE OF THE MATRIX IN 
NTAPEO (SEE EXPLANATION FOR THE ARGUMENT INDEX IN 
SUBROUTINE ETCGP) 

NTBLKS - NO. OF BLOCK ROWS 

NDBLK - ARRAY TO STORE BLOCK SIZES, NDBLK(NTBLKS+l)=NO. OF RHS 
WORK - WORKING ARRAY (COMPLEX TO USER) 

LWORK - LENGTH OF THE ARRAY WORK REGARDED AS COMPLEX 

LWORK.GE. (NTBLKS*>»2 + 3*NTBLKS ♦ 3*MXB + 3*N) 

WHERE MXB = (MXBLK**2) ♦ (MXBLK+2)/2 WITH MXBLK = THE 
MAXIMUM OF ALL THE ENTRIES OF THE ARRAY NDBLK, AND WHERE 
N IS THE ORDER OF THE SYSTEM 
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5.6 ERROR MESSAGES 


5.6.1 ERROR MESSAGES FROM ETCGP 

1 . WORKING SPACE TOO SMALL, AT LEAST ************* WORDS ARE 
NEEDED RETURN FROM ETCGP. 

2. WRONG CHOICE OF COMPUTATION PATH, ITAG SHOULD EQUAL TO 1 ,2 OR 3. 

3 . MATRIX SEEMED SINGULAR, EXIT FROM ETCGP. 

5.6.2 ERROR MESSAGES FROM ETCSM 

1 . WO RKING SPACE TOO SMALL, AT LEAST *************WORDS ARE 
NEEDED RETURN FROM ETCSM. 

2. MATRIX SEEMED SINGULAR, EXIT FROM ETCSM. 

5.6.3 ERROR MESSAGES FROM ETCIT 

1 . WORKING SPACE TOO SMALL, AT LEAST ************* WORDS ARE 
NEEDED RETURN FROM ETCIT. 

2. CONVERGENCE TOO SLOW, RETURN FROM ETCIT. 
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5.7 SAMPLE PROBLEMS 


5.7.1 SAMPLE PROBLEM 1 

PROGRAM TGP( INPUT, OUTPUT, TAP£5=INPUT, TAPE6=0UTPUT, TAPES 


THIS PROGRAM ILLUSTRATES THE USE OF ETCGP. 

THE PROFILE OF THE MTRIX IS BLOCK TRI ADI AGO N AL . 

THE SUBBLOCKS ARE 2X2 BLOCKS. 

WE CALL THE RANDOM NUMBER GENERATOR TO GENERATE THE MATRIX ENTRIES 
WE USE THE ROW SUM OF THE MATRIX AS OUR RHS. 


COMPLEX W(IOOO) ,WW(10,100) 

INTEGER IN(10,11) ,NDBLK(11),JN(115) ,KN(110) ,NC(10) 

NOTE THE W ARRAY SHOULD BE EQUIVALENCED TO WW 
AND JN SHOULD BE EQUIVALENCED TO WW(1,MXC+1) 
EQUIVALENCE (WW,W), ( KN, IN) , ( JN, WW( 1 , 12) ) 

DATA KN/110*0/,W/1000*(0.,0.)/ 

DATA NTBLKS/5/, NDBLK/2, 2, 2, 2, 2, 1/ 

NOTE THE W ARRAY SHOULD BE EQUIVALENCED TO WW 
AND JN SHOULD BE EQUIVALENCED TO WW(1,MXC+1) 
EQUIVALENCE ( WW, W) , ( KN, IN) , ( JN, WW( 1 , 12 ) ) 

DATA 

GENERATE INDEX ARRAY IN IN 

NTl =NTBLKS 
NT1=NTLBKS+1 
NTM1=NTBLKS-1 
IN(1,1)=1 
IN(1,2)=2 
IN(1,NT1)=3 
NS=3 

D03I=2,NTM1 
IN(I,I-1)=NS+1 
IN(I,I)=NS+2 
IN(l,I-fl)=NS+3 
IN(I,NTl)=NS+4 
NS=NS+4 
3 CONTINUE 

IN(NTBLKS,NTM1) =NS+1 
I N ( NTBLKS , NTBLKS ) =NS +2 
IN( NTBLKS, NTl ) =NS+3 
NS=NS+3 
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FIND OUT HOW MANY NON-ZERO COLUMNS IN THE BLOCK ROW 


D04I=1,NTBLKS 

NC(I)=0 

D04J*=1,NTBLKS 

4 IF (IN(I, J) .GT.O)NC(I)s=NC(I)+NDBLK(J) 

WRITE(6,100) ( CIN(I, J) ,J=1, 11) ,1=1,10) 

100 FORMAT(*INDEX ARRAY IN »*,/, (1115)) 

GENERATE NTBLKS BLOCK ROW OF THE MATRIX 

D010I=1,NTBLKS 

NC1=NC(I)+1 

GENERATE NC(I)*20 RANDOM NUMBERS 
CALL N0GEN(W,NC(I)*20) 

ZERO OUT WW(.,NC1)F0R ROW SUM 

ND=NDBLK(I) 

COMPUTE HOW SUM 
MC=NC(I) 

D06J=1,ND 

WW(J,NC1)=(0.,0.) 

D06K=1,MC 

6 WW(J,NC1)=WW(J,NC1)+WW(J,K) 

WRITE I-TH BLOCK ROW FOR ETCGP 

CALL ETBMGEN (1 , 8, IN, 10, JN, NTBLKS, NDBLK, WW, 10, 11 ) 
10 CONTINUE 

CALL ETCGP (8, IN, NTBLKS, NDBLK, W, 1000,1) 

TO RED SOLUTION FROM TAPE8 
DOl 11=1, NTBLKS 

CALL ETRDSOL (I , 8, IN, 10, JN, NTBLKS, NDBLK, WW, 10, 11 ) 
ND=NDBLK(I) 

11; WRITE(6, 101)1, (WW(J,1),J»1,ND) 

101 F0RMAT(1X,I5,*-TH BLOCK SOLUTION*, ,/,(8E10. 4) ) 
STOP 
END 
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5.7.2 SAMPLE PROBLEM 2» 


PROGRAM TSM (INPUT, OUTPUT, TAPE 5 =INPUT, TAPE6=0UTPUT, TAPES) 

THIS PROGRAM ILLUSTRATES THE USE OF ETCSM. 

THE PROFILE OF THE MATRIX IS OF THE FORM A=B+D, WHERE B IS 
BLOCK TRIDIANGONAL, AND D ONLY CONSISTS OF 2 NON-ZERO ROWS. 

THE SUBBLOCKS ARE 2X2 BLOCKS. 

WE CALL THE RANDOM NUMBER GENERATOR TO GENERATE THE MATRIX ENTRIES 
WE USE THE ROW SUM OF THE MATRIX AS OUR RHS. 

COMPLEX W( 1000), WW( 10, 100) 

INTEGER IN(10,11),NDBLK(11),JN(115),KN(110),NC(10),NB(2,5) 

NOTE THE W ARRAY SHOULD BE EQUIVALENCED TO WW 
AND JN SHOULD BE EQUIVALENCED TO WW(1,MXC+1) 

EQUIVALENCE (WW,W), (KN,IN) , ( JN, WW( 1, 12) ) 

DATA KN/110*0/,W/1000*(0.,0.)/ 

DATA NTBLKS/5/, NDBLK/2 , 2 , 2 , 2 , 2 , 1/ 

GENERATE INDEX ARRAY IN IN 

NTl =NTBLKS 

NT1=NTBLKS+1 

NTM1= NTBLKS-1 

IN(l,l)i=l 

IN(1,2)=2 

IN(1,NT1)=3 

NS =3 

D03I=2,NTM1 
IN(I,I-1)=NS+1 
INU,I) =NS+2 
IN(I,I+l)=NS+3 
IN(I,NTl)=NS+4 
NS=NS+4 
3 CONTINUE 

IN( NTBLKS, NTMl ) =NS+1 
IN( NTBLKS, NTBLKS) =NS+2 
IN( NTBLKS, NTl ) =NS+3 
NS=NS+3 

DO 51 =4, NTBLKS 
NS=NS+1 
5 IN(I,2)=NS 

DEFINE THE STRUCTURE OF D 

NB(1,1)=2 

NB(1,2)=4 

NB(1,3)=1 

NB(1,4)=2 

NB(1,5)=0 

FIND OUT HOW MANY NON-ZERO COLUMNS IN THE BLOCK ROW 
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D04I =1 , NTBLKS 
NC(I)=0 
D04J=1, NTBLKS 

4 IF (IN(I,J) .GT.O)NC(l)=NC(I)+NDBLK(J) 

C 

WRITE(6,100)((IN(I,J),J=1,11),I=1,10) 

100 FORMAT(*INDEX ARRAY INi *, /, ( 111 5) ) 

GENERATE NTBLKS BLOCK ROW OF THE MATRIX 

D010I=1, NTBLKS 
NClsNCCD+l 

GENERATE NC(I)*20 RANDOM NUMBERS 
CALL N0GEN(W, NC(I)*20) 

ZERO OUT WW( . , NCI ) FOR ROW SUM 

ND=NDBLK(I) 

COMPUTE ROW SUM 
MC=NC(I) 

D06J=1,ND 

WW(J,NC1)=(0.,9*) 

D06K»1,MC 

6 WW(J,NC1)=WW(J,NC1)^WW(J,K) 

WRITE I-TH BLOCK ROW FOR ETCGP 

CALL ETSMGEN ( I , 8, IN, 10, JN, NTBLKS, NDBLK, WW, 10, 11 , NB, 2, 1 , 2) 

10 CONTINUE 

CALL ETCSM(8,IN,NTBLKS,NDBLK,W,1000,1,NB,2,1,2) 

TO READ SOLUTION FROM TAPE8 
DO 111=1, NTBLKS 

CALL ETRDSOL (1 , 8, IN, 10, JN, NTBLKS, NDBLK, WW, 10, 11 ) 
ND=NDBLK(I) 

11 WRITE(6, 101)1, WW(J,1),J=1,ND) 

101 F0RMAT(1X,I5,*-TH BLOCK SOLUTION*, ,/,( 8E10. 4) ) 

STOP 
END 
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